Understanding of Navy Technical Language via Statistical Parsing
Abstract
A key problem in indexing technical information is the interpretation of technical words and word senses, expressions not used in everyday language. This is important for captions on technical images, whose often pithy descriptions can be valuable to decipher. We describe the natural-language processing for MARIE-2, a natural-language information retrieval system for multimedia captions. Our approach is to provide general tools for lexicon enhancement with the specialized words and word senses, and to learn word usage information (both on word senses and word-sense pairs) from a training corpus with a statistical parser. Innovations of our approach are in statistical inheritance of binary co-occurrence probabilities and in weighting of sentence subsequences. MARIE-2 was trained and tested on 616 captions (with 1009 distinct sentences) from the photograph library of a Navy laboratory. The captions had extensive nominal compounds, code phrases, abbreviations, and acronyms, but few verbs, abstract nouns, conjunctions, and pronouns. Experimental results fit a processing time in seconds of $0.0858\,n^{2.876}$ and a number of tries before finding the best interpretation of $1.809\,n^{1.668}$, where $n$ is the number of words in the sentence. Use of statistics from previous parses definitely helped in reparsing the same sentences, helped accuracy in parsing of new sentences, and did not hurt time to parse new sentences. Word-sense statistics helped dramatically; statistics on word-sense pairs generally helped but not always.
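To give a feel for how the two fitted curves in the abstract scale with sentence length, here is a minimal sketch that evaluates them. It assumes the fits read as time = 0.0858 · n^2.876 seconds and tries = 1.809 · n^1.668, which is a reconstruction of the garbled figures in the text. The function names and the sample sentence lengths are illustrative and do not come from the paper.

```python
# Assumed reading of the fits quoted in the abstract (reconstructed, not verified):
#   processing time (s) = 0.0858 * n ** 2.876
#   tries before best   = 1.809  * n ** 1.668
TIME_COEFF, TIME_EXP = 0.0858, 2.876
TRIES_COEFF, TRIES_EXP = 1.809, 1.668


def parse_time_seconds(n: int) -> float:
    """Fitted processing time in seconds for a sentence of n words."""
    return TIME_COEFF * n ** TIME_EXP


def expected_tries(n: int) -> float:
    """Fitted number of tries before the best interpretation is found."""
    return TRIES_COEFF * n ** TRIES_EXP


if __name__ == "__main__":
    # Hypothetical sentence lengths chosen only for illustration.
    for n in (5, 10, 20):
        print(f"n={n:2d}  time={parse_time_seconds(n):8.2f} s  "
              f"tries={expected_tries(n):7.1f}")
```

Under this reading, doubling the sentence length multiplies the fitted time by 2^2.876 (a little over seven) and the fitted number of tries by 2^1.668 (a little over three).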
Related resources
Active Knowledge Structures in Natural Language Understanding
We view the task of the robust understanding of messages from texts and discourse as the use of extraction of gists from a noisy background by using techniques of (a) the recursive computation of agent's points of view of each others' environments, beliefs, expertise etc. and (b) representations of those beliefs and expertise as networks which are obtained by "best-fit" methods against stored k...
PROGRESS REPORT: Active Knowledge Structures in Natural Language Understanding
In the case of the other semantics-based parser PM, Jerry Ball took two of the texts from the Navy message database and added the vocabulary from those messages to the parser's lexicon. After a small amount of modification, the parser was able to parse about 80% of the sentences in those two messages into reasonable representations. With some additional work this percentage can be improved. Giv...
An improved joint model: POS tagging and dependency parsing
Dependency parsing is a form of syntactic parsing that automatically analyzes the dependency structure of sentences, producing a dependency graph for each input sentence. Part-Of-Speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers do the POS tagging task along with dependency parsing in a pipeline mode. Unfortunately, in pipel...
The Rise of Statistical Parsing
The effectiveness of statistical parsing has almost completely overshadowed the previous dependence on rule-based parsing. Statistically learning how to parse sentences from sample data appeals strongly as the “right” approach for both practical (performance) and theoretical (human-likeness) reasons. Benefitting from, and contributing to, the advancement in machine learning, state-of-the-art sta...
Understanding Complex Natural Language Explanations In Tutorial Applications
We describe the WHY2-ATLAS intelligent tutoring system for qualitative physics that interacts with students via natural language dialogue. We focus on the issue of analyzing and responding to multisentential explanations. We explore an approach that combines a statistical classifier, multiple semantic parsers and a formal reasoner for achieving a deeper understanding of these explanations in or...
Publication date: 2004